Search Algorithms

Problem Definition:

So far, we never considered the cost to arrive at cities. We always choose a node only on FIFO or LIFO, not on which node might be next best candidate. Doesn't taking in to account node credibility make much more sense? That is basis for our UCS.

In our map, we already have const information to move from one node to another connected node. We also earlier showed how we calculated that using geodesic method.

In [1]:
from docHelpers_ipython import Doc, romania_location_map_full
from IPython.display import display, Image

romania_location_map_open_full = {
    'Arad' : {'pos': (21.31227,46.18656), 'connections': ['Sibiu','Timisoara','Zerind'] },
    'Sibiu' : {'pos': (24.12558,45.79833), 'connections': ['Fagaras','Rimnicu Vilcea'] },
    'Zerind' : {'pos': (21.51742,46.62251), 'connections': ['Oradea'] },
    'Timisoara' : {'pos': (21.20868,45.74887), 'connections': ['Lugoj'] },
    'Oradea' : {'pos': (21.91894,47.04650), 'connections': [] },
    'Fagaras' : {'pos': (24.97310,45.84164), 'connections': ['Bucharest'] },
    'Lugoj' : {'pos': (21.90346,45.69099), 'connections': ['Mehadia'] },
    'Rimnicu Vilcea' : {'pos': (24.36932,45.09968), 'connections': ['Craiova','Pitesti'] },
    'Mehadia' : {'pos': (22.36452,44.90411), 'connections': ['Dobreta'] },
    'Dobreta' : {'pos': (22.65973,44.63692), 'connections': [] },
    'Craiova' : {'pos': (23.79488,44.33018), 'connections': [] },
    'Pitesti' : {'pos': (24.86918,44.85648), 'connections': [] },
    'Bucharest' : {'pos': (26.10254,44.42677), 'connections': ['Giurgiu','Urziceni'] },
    'Giurgiu' : {'pos': (25.96993,43.90371), 'connections': [] },
    'Urziceni' : {'pos': (26.64112,44.71653), 'connections': ['Hirsova','Vaslui'] },
    'Vaslui' : {'pos': (27.72765,46.64069), 'connections': ['Lasi'] },
    'Lasi' : {'pos':(27.60144,47.15845), 'connections': ['Neamt'] },
    'Neamt' : {'pos': (26.38188,46.97587), 'connections': [] },
    'Hirsova' : {'pos': (27.94566,44.68935), 'connections': ['Eforie'] },
    'Eforie' : {'pos': (28.65273,44.04911), 'connections': [] }
}

doc = Doc(romania_location_map_open_full)
mappy = doc.showMap()
display(Image(mappy))

The problem statement: find a route from Arad to Bucharest

Problem Setup - Cost Approach

Since we pick up any node in the bag based on cost associated with it, we need to also store the cost along with it. We need a better queue mechanism to handle which we will see shortly.

Further cost associated with each node, is the total cost from start to that node. This we would calculate dynamically as we go on exploring each node.

That is, every time we add a new node to the queue we will also now include

new_cost = total_cost_from_start_to_that_node

Let us denote new_cost by gScore mathematically g(n)

Just also recall our earlier method we used to calculate cost. This would be like step cost between 2 nodes which we can now use to calculate total cost g(n) (total cost as sum of step costs)

In [2]:
import geopy.distance
from math import floor
def calculate_GD(start, end):

    (x0,y0) = start
    (x1,y1) = end
    return floor(geopy.distance.geodesic((y0,x0), (y1,x1)).miles)

M = romania_location_map_open_full
cost = calculate_GD(M['Arad']['pos'],M['Sibiu']['pos'])  #calculating between Arad and Sibiu
cost
Out[2]:
138

Now we will use above function slightly differently. When we add each node, we will just pass on node information to the function above, and get the distance value. We shall also rename this function to 'distance'.

In [3]:
import geopy.distance
from math import floor
def distance(start_node, end_node):

    (x0,y0) = M[start_node]['pos']
    (x1,y1) = M[end_node]['pos']
    return floor(geopy.distance.geodesic((y0,x0), (y1,x1)).miles)

M = romania_location_map_open_full
cost = distance('Arad','Sibiu')  #calculating between Arad and Sibiu
cost
Out[3]:
138

Priority Queue basics

Before we start our manual iterations, trying out this new strategy, we have to get used to a special queue called priority queue, because, this is an efficient python bag we have, where when we store nodes, we could also store the gScore associated with that node, as priority, and this is the interesting part: whenever we pop the queue, we get the one with least cost/highest priority (highest priority is one having lowest number associated with it, yeah, its kinda confusing).

1. Creating a simple queue and checking if its empty

In [4]:
from queue import PriorityQueue
q = PriorityQueue()
q.empty()
Out[4]:
True

2. Storing and retrieval of simple data

In [5]:
q.put((3, 'python'))
q.put((4,'java'))
q.put((2, 'ruby'))
q.put((2,'golang'))  #Note I am also testing what happens when 2 or more elements have same prio
list(q.queue)
Out[5]:
[(2, 'golang'), (2, 'ruby'), (3, 'python'), (4, 'java')]
In [6]:
q.get() 
Out[6]:
(2, 'golang')
In [7]:
q.get()
Out[7]:
(2, 'ruby')

Note: Calling q.get() or q.get(False) when q is empty, makes ipython kernel hang. Always check if Q is empty before accessing

In [8]:
while not q.empty():
    print(q.get())
(3, 'python')
(4, 'java')

Only above two because we already retrieved lower priority elements earlier with get() method. We could put all at once, and try retrieving via loop.

In [9]:
q = PriorityQueue()

q.put((3, 'python'))
q.put((4,'java'))
q.put((2, 'ruby'))
q.put((2,'golang'))  

while not q.empty():
    print(q.get())
(2, 'golang')
(2, 'ruby')
(3, 'python')
(4, 'java')

Our storage could look something like below..

In [10]:
q = PriorityQueue()

q.put((0, 'A'))     # 0 cost to arrive at start (we consider A as start, costs always relative to start node)
q.put((138,'S'))    # 'A' to 'S' costs 138
q.put((179, 'F'))   # 'A' to 'S' to 'F' costs 138+41 = 179 and so on..
q.put((30, 'T'))    # 'A' to 'T' costs 30

while not q.empty():
    print(q.get())
(0, 'A')
(30, 'T')
(138, 'S')
(179, 'F')

gScore

Now that priority queue is clear, let us recall gScore. The idea to calculate gScore is this:

When we are at any node, we will know the cost of that current_node from root (assume for now). We can calculate distance between current node and neighbor. Then we could simply use below formula.

gScore = 'cost till current_node' + 'cost between current node and its neighbor'

Whenever we add a node to the priority queue (shortly pQ), we will always add with the cost with that node. That is the cost from start to that current node. When we pop the queue, we automatically get that value. In fact, when we pop the pQ, we get a tuple: (priority, value), in our case (cost, node). It is this information we could make use of in our algorithm. It would be something like below

#unlike earlier in BFS/DFS, this time when we pop, we get 2 values..
current_cost, current_node = openSet.get()

# and then when we add a node to openSet pQ, we could calculate gScore as below
gScore = current_cost + distance(current_node, each_neighbor)

First iteration will clear this up, if it is still not clear.

Algorithm Development

Let us initialize as usual...hope you are now familiar with below constructs already from earlier sessions.

Note this time, our openSet is a priority queue, so we could store the cost information as well. And cost information is always relative to start. So for 'A' it would be 0. No cost to going where you are already!

openSet = { (0,'A') }  
closedSet = ( 'A' )
cameFrom[ 'A' ] = None

Coding..

In [11]:
from queue import PriorityQueue
from docHelpers_ipython import romania_location_map

cameFrom = { } 
openSet = PriorityQueue()
M = romania_location_map
start = 'A'
goal = 'B'

openSet.put((0,start))  
closedSet = set(start)
cameFrom[start] = None

ITERATION 1

Is openSet empty? No. So Go on.
Take least cost node from openSet
{(0, 'A')} : (0, 'A')
Current node: 'A'. Kids: 'S','T','Z' (now order doesn't matter unlike earlier in BFS or DFS)
Current_cost: 0
Is 'A' the goal? No. So Go on.

                        closedSet : ( 'A' ) already

'A' to 'S':
new cost from 'A' to 'S' : 0 + 138 = 138
closedSet : ( 'A', 'S' ) 'A' to 'T':
new cost from 'A' to 'T' : 0 + 30 = 30 closedSet : ( 'A', 'S', 'T' ) 'A' to 'Z':
new cost from 'A' to 'Z' : 0 + 31 = 31
closedSet : ( 'A', 'S', 'T', 'Z' )

Result: openSet = { (138,'S'), (30,'T'), (31,'Z') }


Coding..

In [12]:
if not openSet.empty():   # Note unike deque, we check differently using empty() if our bag is empty or not

    current_cost, current_node = openSet.get()   # Note, we 'get()' now and it returns a 'set', a tuple

    if current_node is not goal:     

        all_neighbors = M.get(current_node,[]).get('connections',[]) 
        for each_neighbor in all_neighbors:   # each neighbor is a 'set' having a cost and node itself
            if each_neighbor not in closedSet: 

                #gScore = 'cost till current_node' + 'cost between current node and its neighbor' 
                gScore = current_cost + distance(current_node,each_neighbor)

                openSet.put((gScore, each_neighbor))                  
                cameFrom[each_neighbor] = current_node    
                closedSet.add(each_neighbor)

list(openSet.queue)
Out[12]:
[(30, 'T'), (138, 'S'), (31, 'Z')]

Visualizing..(do not bother about order of added nodes)

In [13]:
from docHelpers_ipython import Doc
from IPython.core.display import HTML

doc = Doc(M)

resultHTML = doc.computeGraphs('A',['T','S','Z'], mappy=True, tree=True, queue=True, HTML=True)
HTML(resultHTML)
Out[13]:
Red dot
Red dot
Red dot

ITERATION 2

Is openSet empty? No. So Go on.
Take least cost node from openSet
{(30, 'T'), (138, 'S'), (31, 'Z')} : (30, 'T')
Current node: 'T'. Kids: 'LU'
Current_cost: 30
Is 'T' the goal? No. So Go on.

                        closedSet : ( 'A', 'S', 'T', 'Z' )

'A' to 'LU': 'A' to 'T' + 'T' to 'LU'
new cost from 'A' to 'LU' : 30 + 33 = 63
closedSet : ( 'A', 'S', 'T', 'Z', 'LU' )

Result: openSet = { (138, 'S'), (31, 'Z'), (63,'LU') }


In [14]:
if not openSet.empty():   # Note unike deque, we check differently using empty() if our bag is empty or not

    current_cost, current_node = openSet.get()   # Note, we 'get()' now and it returns a 'set', a tuple

    if current_node is not goal:     

        all_neighbors = M.get(current_node,[]).get('connections',[]) 
        for each_neighbor in all_neighbors:   # each neighbor is a 'set' having a cost and node itself
            if each_neighbor not in closedSet: 

                #gScore = 'cost till current_node' + 'cost between current node and its neighbor' 
                gScore = current_cost + distance(current_node,each_neighbor)

                openSet.put((gScore, each_neighbor))                  
                cameFrom[each_neighbor] = current_node    
                closedSet.add(each_neighbor)

list(openSet.queue)
Out[14]:
[(31, 'Z'), (138, 'S'), (63, 'LU')]
In [15]:
resultHTML = doc.computeGraphs('T',['LU'], mappy=True, tree=True, queue=True, HTML=True)
HTML(resultHTML)
Out[15]:
Red dot
Red dot
Red dot

ITERATION 3

Is openSet empty? No. So Go on.
Take least cost node from openSet
{(31, 'Z'), (138, 'S'), (63, 'LU')} : (30, 'T')
Current node: 'Z'. Kids: 'O'
Current_cost: 31
Is 'Z' the goal? No. So Go on.

                        closedSet : ( 'A', 'S', 'T', 'Z', 'LU' )

'A' to 'O': 'A' to 'Z' + 'Z' to 'O'
new cost from 'A' to 'O' : 31 + 34 = 65
closedSet : ( 'A', 'S', 'T', 'Z', 'LU', 'O' )

Result: openSet = { (138, 'S'), (63, 'LU'), (65,'O') }


In [16]:
if not openSet.empty():   # Note unike deque, we check differently using empty() if our bag is empty or not

    current_cost, current_node = openSet.get()   # Note, we 'get()' now and it returns a 'set', a tuple

    if current_node is not goal:     

        all_neighbors = M.get(current_node,[]).get('connections',[]) 
        for each_neighbor in all_neighbors:   # each neighbor is a 'set' having a cost and node itself
            if each_neighbor not in closedSet: 

                #gScore = 'cost till current_node' + 'cost between current node and its neighbor' 
                gScore = current_cost + distance(current_node,each_neighbor)

                openSet.put((gScore, each_neighbor))                  
                cameFrom[each_neighbor] = current_node    
                closedSet.add(each_neighbor)

list(openSet.queue)
Out[16]:
[(63, 'LU'), (138, 'S'), (65, 'O')]
In [17]:
resultHTML = doc.computeGraphs('Z',['O'], mappy=True, tree=True, queue=True, HTML=True)
HTML(resultHTML)
Out[17]:
Red dot
Red dot
Red dot

Ok, its testing patience after iteration 3 already. Let us wrap it in our loop and see how it ends? Don't forget to notice the change about exiting the loop once you find the node

In [18]:
from queue import PriorityQueue
from docHelpers_ipython import romania_location_map

def distance(start_node, end_node):

    (x0,y0) = M[start_node]['pos']
    (x1,y1) = M[end_node]['pos']
    return floor(geopy.distance.geodesic((y0,x0), (y1,x1)).miles)


# ALGORITHM

cameFrom = { } 
openSet = PriorityQueue()
M = romania_location_map
start = 'A'
goal = 'B'

openSet.put((0,start))  
closedSet = set(start)
cameFrom[start] = None

while not openSet.empty():   # Note unike deque, we check differently using empty() if our bag is empty or not

    current_cost, current_node = openSet.get()   # Note, we 'get()' now and it returns a 'set', a tuple

    if current_node is goal:
        print('Yes, Success')
        break

    if current_node is not goal:     

        all_neighbors = M.get(current_node,[]).get('connections',[]) 
        for each_neighbor in all_neighbors:   # each neighbor is a 'set' having a cost and node itself
            if each_neighbor not in closedSet: 

                #gScore = 'cost till current_node' + 'cost between current node and its neighbor' 
                gScore = current_cost + distance(current_node,each_neighbor)

                openSet.put((gScore, each_neighbor))                  
                cameFrom[each_neighbor] = current_node    
                closedSet.add(each_neighbor)

        print(openSet.queue)

list(openSet.queue)
[(30, 'T'), (138, 'S'), (31, 'Z')]
[(31, 'Z'), (138, 'S'), (63, 'LU')]
[(63, 'LU'), (138, 'S'), (65, 'O')]
[(65, 'O'), (138, 'S'), (121, 'M')]
[(121, 'M'), (138, 'S')]
[(138, 'S'), (144, 'D')]
[(144, 'D'), (179, 'F'), (187, 'RV')]
[(179, 'F'), (187, 'RV'), (203, 'C')]
[(187, 'RV'), (203, 'C'), (291, 'B')]
[(203, 'C'), (291, 'B'), (216, 'P')]
[(216, 'P'), (291, 'B')]
[(291, 'B')]
Yes, Success
Out[18]:
[]

Let us modularize as a proper function like BFS/DFS and check again..

In [19]:
from queue import PriorityQueue
from docHelpers_ipython import romania_location_map
from math import floor
import geopy

cameFrom = { }  

def distance(start_node, end_node):

    (x0,y0) = M[start_node]['pos']
    (x1,y1) = M[end_node]['pos']
    return floor(geopy.distance.geodesic((y0,x0), (y1,x1)).miles)

def reconstructPath(current_node):
    Path = [current_node]
    while current_node is not None:
        current_node = cameFrom[current_node] 
        Path.append(current_node)
    return reversed(Path[:-1])   

def UCS(start, goal): 

    # INITIALIZATION
    openSet = PriorityQueue()
    openSet.put((0,start))  
    closedSet = set(start)
    cameFrom[start] = None               

    # MAIN LOOP
    while not openSet.empty():

        current_cost, current_node = openSet.get()     

        if current_node is goal:   
            print('Success. Route from {} to {} found. Cost: {} Path: {}'.format(start,goal,current_cost,list(reconstructPath(current_node))))
            break

        all_neighbors = M.get(current_node,[]).get('connections',[]) 
        for each_neighbor in all_neighbors:      
            if each_neighbor not in closedSet:  

                #gScore = 'cost till current_node' + 'cost between current node and its neighbor' 
                gScore = current_cost + distance(current_node,each_neighbor)

                openSet.put((gScore, each_neighbor))                  
                cameFrom[each_neighbor] = current_node    
                closedSet.add(each_neighbor)

        #print(openSet.queue)  #just to verify


    return 'No Goal found!'


M = romania_location_map 
start = 'A'
goal = 'B'
result = UCS(start, goal)
Success. Route from A to B found. Cost: 291 Path: ['A', 'S', 'F', 'B']
In [20]:
# one more test

start = 'S'
goal = 'N'
result = UCS(start, goal)
Success. Route from S to N found. Cost: 422 Path: ['S', 'F', 'B', 'U', 'V', 'LA', 'N']

Voila! Our UCS algorithm works. Now we not only have a route, but also total cost involved :)

Visualization (Optional)

We will inject a small code to keep track of bag contents, nodes traversed for sake of visualization. And then render an animation to visualize how nodes were traversed to reach the goal.

In [21]:
from queue import PriorityQueue
from docHelpers_ipython import romania_location_map
from math import floor
import geopy

# VISUALIZATION PURPOSE
from docHelpers_ipython import Doc 
from IPython.display import display, Image

cameFrom = { }  

def distance(start_node, end_node):

    (x0,y0) = M[start_node]['pos']
    (x1,y1) = M[end_node]['pos']
    return floor(geopy.distance.geodesic((y0,x0), (y1,x1)).miles)

def reconstructPath(current_node):
    Path = [current_node]
    while current_node is not None:
        current_node = cameFrom[current_node] 
        Path.append(current_node)
    return reversed(Path[:-1])   

def UCS(start, goal): 

    # INITIALIZATION
    openSet = PriorityQueue()
    openSet.put((0,start))  
    closedSet = set(start)
    cameFrom[start] = None               

    # MAIN LOOP
    while not openSet.empty(): 

        current_cost, current_node = openSet.get()     

        if current_node is goal:   
            print('Success. Route from {} to {} found. Cost: {} Path: {}'.format(start,goal,current_cost,list(reconstructPath(current_node))))
            break

        # VISUALIZATION PURPOSE
        all_neighbors = reversed(M.get(current_node,[]).get('connections',[]))        
        considered_neighbors = list(set(all_neighbors) - set(closedSet)) # thank you: https://stackoverflow.com/questions/3462143/get-difference-between-two-lists       

        all_neighbors = M.get(current_node,[]).get('connections',[]) 
        for each_neighbor in all_neighbors:      
            if each_neighbor not in closedSet:  

                #gScore = 'cost till current_node' + 'cost between current node and its neighbor' 
                gScore = current_cost + distance(current_node,each_neighbor)

                openSet.put((gScore, each_neighbor))                  
                cameFrom[each_neighbor] = current_node    
                closedSet.add(each_neighbor)

        # VISUALIZATION PURPOSE
        _ = doc.computeGraphs(current_node, considered_neighbors)

    # VISUALIZATION PURPOSE     
    _ = doc.computeGraphs(current_node, [])            
    return 'No Goal found!'


M = romania_location_map 
start = 'A'
goal = 'B'

# VISUALIZATION PURPOSE - called here caz we use doc in ourSearchAlgo
doc = Doc(M) 

result = UCS(start, goal)

# VISUALIZATION PURPOSE
images = doc.render()
display(Image(images[0]),Image(images[1]),Image(images[2]))
Success. Route from A to B found. Cost: 291 Path: ['A', 'S', 'F', 'B']

BFS vs DFS vs UCS Visual Comparison:

Take a moment to compare the BFS, DFS along with UCS, we just finished.

    
    
    
    

Hmm. UCS does not appear to be better in our case compared to DFS, making much more iterations. But in real life scenario, cost should be much more determining factor, to choose next step towards the goal. Isn't it? Can we do something about it? Over to A* next. Also unless lucky, DFS might take much much longer, if happens to choose a different node than one that eventually leads to goal.

By the way, UCS is Uniform Cost Search, even though step costs are very different here. This is because, typically in large graphs, the step costs happen to be more or less same (but total cost could vary still making our choice better), so its "uniform cost". Think of a grid problem (cells in a grid), and each cell can move in 4 directions, one cell at a time. So step cost there is always 1.